1 Data

Information about the test:

question description MATH_group
A1 properties of fractions A
A2 find intersection of lines A
A3 composition of functions B
A4 completing the square A
A5 trig double angle formula A
A6 trig wave function A
A7 graphical vector sum B
A8 compute angle between 3d vectors A
A9 simplify logs A
A10 identify graph of rational functions B
A11 summing arithmetic progression A
A12 find point with given slope A
A13 equation of tangent A
A14 find minimum gradient of cubic B
A15 find and classify stationary points of cubic A
A16 trig chain rule A
A17 chain rule A
A18 definite integral A
A19 area between curve and x-axis (in 2 parts) B
A20 product rule with given values B

Load the student scores for the test:

test_scores %>% skim()
Data summary
Name Piped data
Number of rows 3496
Number of columns 23
_______________________
Column type frequency:
character 2
numeric 21
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
year 0 1 7 7 0 4 0
AnonID 0 1 9 9 0 3472 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Total 0 1 67.85 23.05 0 57.00 72.5 85 100 ▁▁▃▇▇
A1 0 1 4.25 1.51 0 5.00 5.0 5 5 ▁▁▂▁▇
A2 0 1 4.23 1.73 0 5.00 5.0 5 5 ▁▁▁▁▇
A3 0 1 3.20 2.40 0 0.00 5.0 5 5 ▅▁▁▁▇
A4 0 1 3.68 1.56 0 3.00 4.0 5 5 ▂▂▂▃▇
A5 0 1 3.97 2.02 0 5.00 5.0 5 5 ▂▁▁▁▇
A6 0 1 1.58 1.89 0 0.00 0.0 2 5 ▇▅▁▁▃
A7 0 1 3.45 2.24 0 0.00 5.0 5 5 ▃▁▁▁▇
A8 0 1 3.03 2.44 0 0.00 5.0 5 5 ▅▁▁▁▇
A9 0 1 4.16 1.87 0 5.00 5.0 5 5 ▂▁▁▁▇
A10 0 1 3.14 2.08 0 2.50 5.0 5 5 ▃▁▃▁▇
A11 0 1 3.89 1.67 0 2.50 5.0 5 5 ▁▁▂▁▇
A12 0 1 3.78 2.06 0 4.00 5.0 5 5 ▃▁▁▁▇
A13 0 1 3.53 2.05 0 2.00 5.0 5 5 ▂▂▁▁▇
A14 0 1 2.50 1.99 0 0.00 2.0 5 5 ▇▇▁▁▇
A15 0 1 3.35 2.13 0 1.25 5.0 5 5 ▃▁▂▁▇
A16 0 1 4.23 1.81 0 5.00 5.0 5 5 ▂▁▁▁▇
A17 0 1 4.36 1.67 0 5.00 5.0 5 5 ▁▁▁▁▇
A18 0 1 3.33 2.36 0 0.00 5.0 5 5 ▅▁▁▁▇
A19 0 1 2.33 2.49 0 0.00 0.0 5 5 ▇▁▁▁▇
A20 0 1 1.86 2.42 0 0.00 0.0 5 5 ▇▁▁▁▅

1.1 Data cleaning

The data includes many abandoned attempts, where students apparently stopped engaging after the first few questions on the test. Unfortunately the data records only a score for each item, not whether the item was actually attempted, so to remove these “non-serious” attempts we eliminate records based on scores in the latter half of the test:

  1. For students who took the test more than once, keep only the attempt with the highest total score and remove the others;

  2. Eliminate students who scored zero on three or more of the 5 easiest questions in the second half of the test; and

  3. Add back to the sample any students scoring 30 or more marks in total.

test_scores <- test_scores_unfiltered %>% 
  # for students with multiple attempts, keep only the highest-scoring attempt
  group_by(AnonID) %>% 
  slice_max(Total, n = 1) %>% 
  ungroup() %>% 
  # count zeros on the 5 easiest questions in the second half of the test
  rowwise() %>% 
  mutate(zeros_in_easiest_5 = sum(A11==0, A12==0, A13==0, A16==0, A17==0)) %>% 
  # drop likely-abandoned attempts, unless the total score is 30 or more
  filter(zeros_in_easiest_5 <= 2 | Total >= 30) %>% 
  select(-zeros_in_easiest_5) %>% 
  ungroup()

bind_rows(
  "unfiltered" = test_scores_unfiltered %>% select(Total),
  "filtered" = test_scores %>% select(Total),
  .id = "dataset"
) %>% 
  group_by(dataset) %>% 
  # add n's to the facet titles
  mutate(dataset = str_glue("{dataset} (n={n()})") %>% as_factor()) %>% 
  # flip them so that filtered appears on the right
  mutate(dataset = fct_rev(dataset)) %>% 
  ggplot(aes(x = Total)) +
  geom_histogram() +
  facet_wrap(vars(dataset)) +
  theme_minimal()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Distribution of scores, filtered vs unfiltered

1.2 Data summary

Summary of each cohort:

test_scores_summary <- test_scores %>% 
  group_by(year) %>% 
  summarise(
    n = n(),
    mean = mean(Total),
    sd = sd(Total),
    median = median(Total)
  )

test_scores_summary %>% 
  gt() %>% 
  fmt_number(columns = c("mean", "sd"), decimals = 2) %>%
  data_color(
    columns = c("n"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  )
year n mean sd median
2013-14 816 69.77 16.94 71.0
2014-15 924 70.30 16.71 71.5
2015-16 721 73.28 17.05 76.0
2016-17 760 77.25 15.31 79.5
test_scores %>% 
  ggplot(aes(x = Total)) +
  ggridges::geom_density_ridges(aes(y = year, fill = year)) +
  #facet_grid(cols = vars(year)) +
  theme_minimal()
## Picking joint bandwidth of 3.9

p1 <- test_scores %>% 
  ggplot(aes(x = Total)) +
  geom_histogram(binwidth = 5) +
  #scale_x_continuous(limits = c(0,100), breaks = c(0, 50, 100)) +
  facet_grid(cols = vars(year)) +
  theme_minimal() +
  labs(x = "Total score (out of 100)",
       y = "Number of students",
       title = "Edinburgh MDT") +
  theme(axis.title.x = element_blank(), axis.title.y = element_blank())

p2 <- test_scores_summary %>% 
  mutate(
    n = str_glue("{n}"),
    mean = str_glue("{round(mean, digits = 1)}"),
    sd = str_glue("{round(sd, digits = 1)}"),
    median = str_glue("{median}")
  ) %>% 
  pivot_longer(c(n, mean, sd, median), names_to = "layer", values_to = "label") %>% 
  mutate(layer = fct_relevel(layer, c("n", "sd", "mean", "median")) %>% fct_rev()) %>% 
  ggplot(aes(x = 80, y = layer, label = label)) +
  geom_text(size = 10 * 5/14, hjust = 1) +
  scale_x_continuous(limits = c(0,100)) +
  facet_grid(cols = vars(year)) +
  labs(y = "", x = NULL) +
  scale_y_discrete(labels = c("n" = "N", "mean" = "Mean", "median" = "Median")) +
  theme_minimal() +
  theme(axis.line = element_blank(), axis.ticks = element_blank(), axis.text.x = element_blank(),
        panel.grid = element_blank(), strip.text = element_blank())

p1 / p2 +  plot_layout(heights = c(5, 2.5))

ggsave("output/uoe_pre_data-summary.pdf", units = "cm", width = 12, height = 8)

Mean and standard deviation for each item:

test_scores %>% 
  select(-AnonID, -Total) %>% 
  group_by(year) %>% 
  skim_without_charts() %>% 
  select(-contains("character."), -contains("numeric.p"), -skim_type) %>% 
  rename(complete = complete_rate) %>% 
  # make the table wider, i.e. with separate columns for each year's results, with the year at the start of each column name
  pivot_wider(names_from = year, values_from = -c(skim_variable, year), names_glue = "{year}__{.value}") %>% 
  # put the columns in order by year
  select(sort(names(.))) %>% 
  select(skim_variable, everything()) %>% 
  # use GT to make the table look nice
  gt(rowname_col = "skim_variable") %>% 
  # group the columns from each year
  tab_spanner_delim(delim = "__") %>%
  fmt_number(columns = contains("numeric"), decimals = 2) %>%
  fmt_percent(columns = contains("complete"), decimals = 0) %>% 
  # change all the numeric.mean and numeric.sd column names to Mean and SD
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.mean"), label = "Mean") %>% deframe()
  ) %>% 
  cols_label(
    .list = test_scores %>% select(year) %>% distinct() %>% transmute(col = paste0(year, "__numeric.sd"), label = "SD") %>% deframe()
  ) %>%
  data_color(
    columns = contains("numeric.mean"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  )
2013-14 2014-15 2015-16 2016-17
complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD complete n_missing Mean SD
A1 100% 0 4.32 1.41 100% 0 4.40 1.32 100% 0 4.45 1.29 100% 0 4.52 1.16
A2 100% 0 4.31 1.65 100% 0 4.41 1.53 100% 0 4.49 1.45 100% 0 4.54 1.38
A3 100% 0 3.33 2.36 100% 0 3.25 2.39 100% 0 3.39 2.34 100% 0 3.61 2.24
A4 100% 0 3.72 1.43 100% 0 3.83 1.33 100% 0 3.91 1.38 100% 0 4.12 1.22
A5 100% 0 4.05 1.96 100% 0 4.19 1.84 100% 0 4.28 1.76 100% 0 4.32 1.71
A6 100% 0 1.48 1.85 100% 0 1.53 1.85 100% 0 1.82 1.96 100% 0 2.03 1.97
A7 100% 0 3.58 2.17 100% 0 3.47 2.22 100% 0 3.70 2.12 100% 0 4.14 1.84
A8 100% 0 2.97 2.46 100% 0 3.04 2.44 100% 0 3.33 2.36 100% 0 3.70 2.19
A9 100% 0 4.31 1.73 100% 0 4.46 1.55 100% 0 4.45 1.57 100% 0 4.65 1.27
A10 100% 0 3.19 2.03 100% 0 3.29 2.00 100% 0 3.34 1.97 100% 0 3.62 1.89
A11 100% 0 4.05 1.45 100% 0 4.16 1.40 100% 0 4.13 1.42 100% 0 4.25 1.35
A12 100% 0 3.99 1.90 100% 0 3.93 1.95 100% 0 4.11 1.81 100% 0 4.29 1.65
A13 100% 0 3.69 1.95 100% 0 3.68 1.94 100% 0 3.83 1.81 100% 0 4.06 1.71
A14 100% 0 2.44 1.93 100% 0 2.54 1.93 100% 0 2.85 1.92 100% 0 3.03 1.90
A15 100% 0 3.57 2.00 100% 0 3.51 2.02 100% 0 3.67 1.95 100% 0 3.68 2.02
A16 100% 0 4.51 1.49 100% 0 4.49 1.52 100% 0 4.55 1.43 100% 0 4.66 1.26
A17 100% 0 4.69 1.21 100% 0 4.59 1.37 100% 0 4.74 1.12 100% 0 4.75 1.09
A18 100% 0 3.48 2.30 100% 0 3.45 2.31 100% 0 3.65 2.22 100% 0 3.83 2.12
A19 100% 0 2.28 2.49 100% 0 2.26 2.49 100% 0 2.61 2.50 100% 0 2.99 2.45
A20 100% 0 1.83 2.41 100% 0 1.82 2.41 100% 0 2.00 2.45 100% 0 2.46 2.50

2 Testing assumptions

Before applying IRT, we should check that the data satisfies the assumptions needed by the model. In particular, to use a 1-dimensional IRT model, we should have some evidence of unidimensionality in the test scores.

2.1 Inter-item correlations

If the test is unidimensional then we would expect student scores on pairs of items to be correlated.

This plot shows the correlations between scores on each pair of items:

item_scores <- test_scores %>% 
  select(starts_with("A"), -AnonID)

cor_ci <- psych::corCi(item_scores, plot = FALSE)

psych::cor.plot.upperLowerCi(cor_ci)

Checking for correlations that are not significantly different from 0, we find only three borderline pairs:

cor_ci$ci %>% 
  as_tibble(rownames = "corr") %>% 
  filter(p > 0.05) %>% 
  arrange(-p) %>% 
  select(-contains(".e")) %>% 
  gt() %>% 
  fmt_number(columns = 2:4, decimals = 3)
corr lower upper p
A3-A8 −0.004 0.067 0.087
A1-A17 −0.004 0.082 0.076
A2-A17 0.000 0.083 0.051

The overall picture is that the item scores are well correlated with each other.

2.2 Dimensionality

structure <- check_factorstructure(item_scores)
n <- n_factors(item_scores)

2.2.1 Is the data suitable for Factor Analysis?

  • KMO: The Kaiser, Meyer, Olkin (KMO) measure of sampling adequacy suggests that data seems appropriate for factor analysis (KMO = 0.89).
  • Sphericity: Bartlett’s test of sphericity suggests that there is sufficient significant correlation in the data for factor analysis (Chisq(190) = 7881.11, p < .001).

2.2.2 Method Agreement Procedure:

The choice of 1 dimension is supported by 6 of the 19 methods (31.58%): t, p, Acceleration factor, Scree (R2), VSS complexity 1, Velicer’s MAP.

plot(n)

summary(n) %>% gt()
n_Factors n_Methods
1 6
2 2
3 2
4 5
12 1
19 3
#n %>% tibble() %>% gt()
fa.parallel(item_scores, fa = "fa")

## Parallel analysis suggests that the number of factors =  5  and the number of components =  NA

2.2.3 1-factor solution

We use the factanal function to fit a 1-factor model.

Note that this function cannot handle missing data, so any NA scores must be set to 0 for this analysis.
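Although the skim above shows no missing values in this dataset, the replacement would look like the following sketch (using `replace_na()` from tidyr; `item_scores_na` is a hypothetical stand-in for a version of `item_scores` containing NAs):

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for item_scores with some missing values
item_scores_na <- tibble(A1 = c(5, NA, 2.5), A2 = c(0, 5, NA))

# Set any NA scores to 0 before passing the data to factanal()
item_scores_filled <- item_scores_na %>% 
  mutate(across(everything(), ~ replace_na(.x, 0)))
```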

fitfact <- factanal(item_scores,
                    factors = 1,
                    rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores, factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.90 0.96 0.83 0.73 0.89 0.84 0.80 0.94 0.89 0.83 0.92 0.84 0.81 0.63 0.89 0.85 
##  A17  A18  A19  A20 
## 0.90 0.85 0.74 0.68 
## 
## Loadings:
##     Factor1
## A4  0.52
## A14 0.61
## A19 0.51
## A20 0.57
## A1  0.31
## A2
## A3  0.41
## A5  0.33
## A6  0.40
## A7  0.45
## A8
## A9  0.34
## A10 0.41
## A11
## A12 0.40
## A13 0.44
## A15 0.33
## A16 0.39
## A17 0.32
## A18 0.39
## 
##                Factor1
## SS loadings       3.29
## Proportion Var    0.16
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 1276.9 on 170 degrees of freedom.
## The p-value is 7.86e-169
load <- tidy(fitfact)

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(item_info, by = "question") %>% 
  ggplot(aes(x = factor_loading, y = 0, colour = MATH_group)) + 
    geom_point() + 
    geom_label_repel(aes(label = question), show.legend = FALSE) +
    scale_colour_manual("MATH group", values = MATH_colours) +
  scale_y_discrete() +
    labs(x = "Factor 1", y = NULL,
         title = "Standardised Loadings", 
         subtitle = "Based on 1-factor solution") +
    theme_minimal()
## Warning: ggrepel: 2 unlabeled data points (too many overlaps). Consider
## increasing max.overlaps

load %>% 
  select(question = variable, factor_loading = fl1) %>% 
  left_join(item_info, by = "question") %>% 
  arrange(-factor_loading) %>% 
  gt() %>%
  data_color(
    columns = contains("factor"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("MATH"),
    colors = MATH_colours
  )
question factor_loading description MATH_group
A14 0.6113509 find minimum gradient of cubic B
A20 0.5654389 product rule with given values B
A4 0.5179007 completing the square A
A19 0.5095020 area between curve and x-axis (in 2 parts) B
A7 0.4479689 graphical vector sum B
A13 0.4363393 equation of tangent A
A3 0.4144852 composition of functions B
A10 0.4129131 identify graph of rational functions B
A6 0.4005742 trig wave function A
A12 0.3980326 find point with given slope A
A16 0.3873155 trig chain rule A
A18 0.3870384 definite integral A
A9 0.3370688 simplify logs A
A15 0.3346979 find and classify stationary points of cubic A
A5 0.3339280 trig double angle formula A
A17 0.3235231 chain rule A
A1 0.3090752 properties of fractions A
A11 0.2909308 summing arithmetic progression A
A8 0.2348027 compute angle between 3d vectors A
A2 0.2082143 find intersection of lines A

It is striking here that the MATH Group B questions are those that load most strongly onto this factor.

2.2.4 4-factor solution

Here we also investigate the 4-factor solution, to see whether these factors are interpretable.

fitfact2 <- factanal(item_scores, factors = 4, rotation = "varimax")
print(fitfact2, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = item_scores, factors = 4, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.88 0.95 0.70 0.69 0.86 0.75 0.77 0.81 0.87 0.80 0.90 0.81 0.69 0.58 0.87 0.59 
##  A17  A18  A19  A20 
## 0.45 0.83 0.75 0.66 
## 
## Loadings:
##     Factor1 Factor2 Factor3 Factor4
## A3  0.54                           
## A16         0.60                   
## A17         0.73                   
## A13                 0.50           
## A1  0.33                           
## A2                                 
## A4  0.48                           
## A5                                 
## A6                          0.42   
## A7  0.37                           
## A8                          0.42   
## A9                                 
## A10 0.40                           
## A11                                
## A12                 0.32           
## A14 0.41            0.46           
## A15                                
## A18                                
## A19 0.37                           
## A20 0.48                           
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings       1.82    1.14    1.01    0.80
## Proportion Var    0.09    0.06    0.05    0.04
## Cumulative Var    0.09    0.15    0.20    0.24
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 196.75 on 116 degrees of freedom.
## The p-value is 4.23e-06
load2 <- tidy(fitfact2)

load2_plot <- load2 %>%
  rename(question = variable) %>% 
  left_join(item_info, by = "question") %>%
  ggplot(aes(x = fl1, y = fl2, colour = MATH_group, shape = MATH_group)) +
  geom_point() +
  geom_text_repel(aes(label = question), show.legend = FALSE, alpha = 0.6) +
  labs(
    x = "Factor 1 (of 4)",
    y = "Factor 2 (of 4)"
  ) +
  scale_colour_manual("MATH group", values = MATH_colours[1:2]) +
  scale_shape_manual(name = "MATH group", values = c(19, 17)) +
  theme_minimal()

load2_plot +
  labs(
    title = "Standardised Loadings",
    subtitle = "Showing the first 2 factors of the 4-factor model"
  )

ggsave("output/uoe_pre_4factor.pdf", units = "cm", width = 14, height = 10, dpi = 300,
       plot = load2_plot)
main_factors <- load2 %>% 
#  mutate(factorNone = 0.4) %>%  # add this to set the main factor to "None" where all loadings are below 0.4
  pivot_longer(names_to = "factor",
               cols = contains("fl")) %>% 
  mutate(value_abs = abs(value)) %>% 
  group_by(variable) %>% 
  top_n(1, value_abs) %>% 
  ungroup() %>% 
  transmute(main_factor = factor, variable)


load2 %>% 
  select(-uniqueness) %>% 
  # add the info about which is the main factor
  left_join(main_factors, by = "variable") %>%
  left_join(item_info %>% select(variable = question, description, MATH_group), by = "variable") %>% 
  arrange(main_factor) %>% 
  select(main_factor, everything()) %>% 
  # arrange questions by descending loading on their main factor
  rowwise() %>% 
  mutate(max_loading = max(abs(c_across(starts_with("fl"))))) %>% 
  group_by(main_factor) %>% 
  arrange(-max_loading, .by_group = TRUE) %>% 
  select(-max_loading) %>% 
  # sort out the presentation
  rename("Main Factor" = main_factor,
         "Question" = variable) %>%
  mutate_at(
    vars(starts_with("fl")),
    ~ cell_spec(round(., digits = 3), bold = if_else(abs(.) > 0.4, T, F))
  ) %>% 
  kable(booktabs = T, escape = F, longtable = T) %>% 
  kableExtra::collapse_rows(columns = 1, valign = "top") %>%
  kableExtra::kable_styling(latex_options = c("repeat_header"))
Main Factor Question fl1 fl2 fl3 fl4 description MATH_group
fl1 A3 0.539 0.035 0.058 0.031 composition of functions B
fl1 A20 0.483 0.144 0.258 0.13 product rule with given values B
fl1 A4 0.483 0.049 0.185 0.197 completing the square A
fl1 A10 0.399 0.086 0.159 0.066 identify graph of rational functions B
fl1 A7 0.373 0.013 0.248 0.16 graphical vector sum B
fl1 A19 0.37 0.177 0.244 0.163 area between curve and x-axis (in 2 parts) B
fl1 A1 0.331 0.025 0.084 0.079 properties of fractions A
fl1 A9 0.247 0.151 0.072 0.202 simplify logs A
fl1 A2 0.166 0.03 0.074 0.114 find intersection of lines A
fl2 A17 0.028 0.727 0.126 0.056 chain rule A
fl2 A16 0.108 0.598 0.135 0.16 trig chain rule A
fl2 A18 0.191 0.244 0.162 0.224 definite integral A
fl3 A13 0.171 0.127 0.504 0.083 equation of tangent A
fl3 A14 0.406 0.122 0.46 0.162 find minimum gradient of cubic B
fl3 A12 0.173 0.103 0.323 0.207 find point with given slope A
fl3 A15 0.127 0.179 0.261 0.135 find and classify stationary points of cubic A
fl4 A8 0.022 0.062 0.083 0.419 compute angle between 3d vectors A
fl4 A6 0.195 0.074 0.171 0.415 trig wave function A
fl4 A5 0.24 0.086 0.08 0.254 trig double angle formula A
fl4 A11 0.21 0.113 0.045 0.212 summing arithmetic progression A

The first factor is dominated by questions that had previously been identified as MATH Group B, i.e. those that are somehow “non-standard” – either requiring students to recognise that a particular rule/procedure is applicable before applying it, or to apply it in an unusual way. This factor also includes Group A questions on “pre-calculus” topics (such as fractions, logarithms and trigonometry) that students had perhaps not practised as recently.

The second factor is dominated by the two chain rule questions (A16 and A17), along with A18 which is a routine definite integral, suggesting this factor is related to routine calculus computations.

The third factor seems to be based on applying calculus techniques to cubic and quadratic curves, e.g. to find tangent lines or stationary points.

The fourth factor is dominated by the only two questions that require the use of a calculator (to compute trigonometric functions), but more generally seems to be based on non-calculus skills (vectors, trig, sequences).

3 Fitting IRT model

The mirt implementation of the generalised partial credit model (gpcm) requires the partial marks to be consecutive integers. We therefore work around this by recoding our scores into that form (e.g. replacing scores of 0, 2.5, 5 with 1, 2, 3), while keeping track of the true score attached to each mark level so that we can correctly compute expected scores later on.

# Determine the mark levels for each item
mark_levels <- item_scores %>% 
  pivot_longer(everything(), names_to = "item", values_to = "score") %>% 
  distinct() %>% 
  arrange(parse_number(item), score) %>% 
  group_by(item) %>%
  mutate(order = row_number()) %>% 
# Note the convention used by mirt: for items with only 2 levels (i.e. 0 marks or full marks),
# the columns are P.0 and P.1, while other items are indexed from 1, i.e. P.1, P.2, ...
# https://github.com/philchalmers/mirt/blob/accd2383b9a4d17a4cab269717ce98434900b62c/R/probtrace.R#L57
  mutate(level = case_when(
    max(order) == 2 ~ order - 1,
    TRUE ~ order * 1.0
  )) %>% 
  mutate(levelname = paste0(item, ".P.", level))

# Use the mark_levels table to replace scores with levels
# (first pivot the data to long form, make the replacement, then pivot back to wide again)
item_scores_levelled <- item_scores %>% 
  # temporarily add row identifiers
  mutate(row = row_number()) %>% 
  pivot_longer(cols = -row, names_to = "item", values_to = "score") %>% 
  left_join(mark_levels %>% select(item, score, level), by = c("item", "score")) %>% 
  select(-score) %>% 
  pivot_wider(names_from = "item", values_from = "level") %>% 
  select(-row)
fit_gpcm <- mirt(
  data = item_scores_levelled, # just the columns with question scores
  model = 1,          # number of factors to extract
  itemtype = "gpcm",  # generalised partial credit model
  SE = TRUE           # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -50895.842, Max-Change: 6.51151
Iteration: 2, Log-Lik: -47143.856, Max-Change: 1.39679
Iteration: 3, Log-Lik: -46111.495, Max-Change: 3.70482
Iteration: 4, Log-Lik: -45840.485, Max-Change: 0.91195
Iteration: 5, Log-Lik: -45689.338, Max-Change: 1.04099
Iteration: 6, Log-Lik: -45627.116, Max-Change: 0.33955
Iteration: 7, Log-Lik: -45599.220, Max-Change: 0.16658
Iteration: 8, Log-Lik: -45588.708, Max-Change: 0.05793
Iteration: 9, Log-Lik: -45583.407, Max-Change: 0.06734
Iteration: 10, Log-Lik: -45580.515, Max-Change: 0.04218
Iteration: 11, Log-Lik: -45578.761, Max-Change: 0.02592
Iteration: 12, Log-Lik: -45577.647, Max-Change: 0.02897
Iteration: 13, Log-Lik: -45576.671, Max-Change: 0.01668
Iteration: 14, Log-Lik: -45576.152, Max-Change: 0.00766
Iteration: 15, Log-Lik: -45575.863, Max-Change: 0.00568
Iteration: 16, Log-Lik: -45575.430, Max-Change: 0.00230
Iteration: 17, Log-Lik: -45575.403, Max-Change: 0.00567
Iteration: 18, Log-Lik: -45575.379, Max-Change: 0.00174
Iteration: 19, Log-Lik: -45575.374, Max-Change: 0.00164
Iteration: 20, Log-Lik: -45575.363, Max-Change: 0.00188
Iteration: 21, Log-Lik: -45575.354, Max-Change: 0.00058
Iteration: 22, Log-Lik: -45575.353, Max-Change: 0.00049
Iteration: 23, Log-Lik: -45575.352, Max-Change: 0.00118
Iteration: 24, Log-Lik: -45575.346, Max-Change: 0.00065
Iteration: 25, Log-Lik: -45575.344, Max-Change: 0.00017
Iteration: 26, Log-Lik: -45575.344, Max-Change: 0.00016
Iteration: 27, Log-Lik: -45575.343, Max-Change: 0.00009
## 
## Calculating information matrix...

3.1 Local independence

We compute Yen’s \(Q_3\) statistic (Yen, 1984) to check for any dependence between items after controlling for \(\theta\). This gives a value for each pair of items, with values above 0.2 regarded as problematic (see DeMars, p. 48).

# Q3 residual correlations for each item pair, controlling for theta
residuals <- residuals(fit_gpcm, type = "Q3") %>% as.data.frame()

residuals %>% as.matrix() %>% 
  corrplot::corrplot(type = "upper")

This shows that most item pairs are independent, with only one pair showing cause for concern:

residuals %>%
  rownames_to_column(var = "item1") %>%
  as_tibble() %>% 
  pivot_longer(cols = starts_with("A"), names_to = "item2", values_to = "Q3_score") %>% 
  filter(abs(Q3_score) > 0.2) %>% 
  filter(parse_number(item1) < parse_number(item2)) %>%
  gt()
item1 item2 Q3_score
A16 A17 0.323394

Items A16 and A17 are on the chain rule (e.g. differentiating \(\cos(4x^2+5)\) and \((3x^2-8)^3\) respectively), so it is perhaps unsurprising that students’ performance on these items is not entirely independent.
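Both follow the same chain-rule pattern: \(\frac{d}{dx}\cos(4x^2+5) = -8x\sin(4x^2+5)\) and \(\frac{d}{dx}(3x^2-8)^3 = 18x(3x^2-8)^2\), so a student who has mastered the rule is likely to answer both correctly.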

Given that this violation of the local independence assumption is very mild, we proceed using this model.

3.2 Model parameters

We augment the data with estimated abilities for each student, using mirt’s fscores() function.

test_scores_with_ability <- test_scores %>%
  mutate(F1 = fscores(fit_gpcm, method = "EAP"))

Next, we extract the IRT parameters.

coefs_gpcm <- coef(fit_gpcm, IRTpars = TRUE)

We use the tidy_mirt_coefs function to get all the parameter estimates into a tidy table:

source("fn_tidy_mirt_coefs.R")

tidy_gpcm <- tidy_mirt_coefs(coefs_gpcm)
tidy_gpcm %>% 
  filter(par == "a") %>% 
  select(-par) %>% 
  rename_with(.fn = ~ paste0("a_", .x), .cols = -Question) %>% 
  left_join(
    tidy_gpcm %>% 
      filter(str_detect(par, "^b")),
    by = "Question"
  ) %>% 
  gt(groupname_col = "Question") %>%
  fmt_number(columns = matches("est|_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = c("est", "CI_2.5", "CI_97.5"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = c("par", "est", "CI_2.5", "CI_97.5"))
Discrimination Difficulty
a_est a_CI_2.5 a_CI_97.5 par est CI_2.5 CI_97.5
A1
0.6739273 0.5766091 0.7712455 b1 -2.37887310 -2.70677829 -2.05096790
0.6739273 0.5766091 0.7712455 b2 -2.77145860 -3.15954563 -2.38337157
A2
0.3663307 0.2934319 0.4392295 b1 2.05108376 1.24385401 2.85831351
0.3663307 0.2934319 0.4392295 b2 -8.62067669 -10.37049167 -6.87086170
A3
1.0870453 0.9675230 1.2065675 b1 -0.84539828 -0.94886622 -0.74193034
A4
0.2734782 0.2460363 0.3009201 b1 11.73622013 7.30832377 16.16411649
0.2734782 0.2460363 0.3009201 b2 -14.41982341 -18.82895390 -10.01069291
0.2734782 0.2460363 0.3009201 b3 3.21551287 1.76923276 4.66179298
0.2734782 0.2460363 0.3009201 b4 -7.39155856 -8.82579336 -5.95732375
0.2734782 0.2460363 0.3009201 b5 2.20536928 1.28228743 3.12845113
0.2734782 0.2460363 0.3009201 b6 -4.92286032 -5.84277186 -4.00294877
0.2734782 0.2460363 0.3009201 b7 1.56215355 0.89309330 2.23121379
0.2734782 0.2460363 0.3009201 b8 -3.60160285 -4.28569144 -2.91751425
0.2734782 0.2460363 0.3009201 b9 6.27564656 5.27623198 7.27506114
0.2734782 0.2460363 0.3009201 b10 -9.54842246 -10.80070414 -8.29614078
A5
1.0451735 0.9039862 1.1863607 b1 -1.90850902 -2.11948357 -1.69753446
A6
0.4263958 0.3782147 0.4745769 b1 0.75043015 0.52648747 0.97437283
0.4263958 0.3782147 0.4745769 b2 9.20247071 7.79017178 10.61476964
0.4263958 0.3782147 0.4745769 b3 -7.47434964 -8.85132818 -6.09737111
A7
0.6618140 0.5902292 0.7333989 b1 1.71629156 1.34094390 2.09163921
0.6618140 0.5902292 0.7333989 b2 -3.93569909 -4.42025082 -3.45114736
A8
0.5232691 0.4326544 0.6138839 b1 -1.24674100 -1.48771035 -1.00577165
A9
1.2731825 1.0950872 1.4512779 b1 -2.08927226 -2.30486362 -1.87368089
A10
0.6375293 0.5675827 0.7074758 b1 -0.83165992 -1.00159912 -0.66172072
0.6375293 0.5675827 0.7074758 b2 -1.09423875 -1.28487792 -0.90359958
A11
0.2874306 0.2455924 0.3292689 b1 -1.83148685 -2.73156569 -0.93140801
0.2874306 0.2455924 0.3292689 b2 -5.93170797 -6.91126447 -4.95215147
0.2874306 0.2455924 0.3292689 b3 9.59073848 7.75997286 11.42150409
0.2874306 0.2455924 0.3292689 b4 -14.01994221 -16.34217642 -11.69770801
A12
0.4469893 0.3938993 0.5000792 b1 0.92339830 0.47109722 1.37569938
0.4469893 0.3938993 0.5000792 b2 0.82752184 0.28810586 1.36693783
0.4469893 0.3938993 0.5000792 b3 -6.81714207 -7.72532434 -5.90895981
A13
0.4498875 0.4002035 0.4995716 b1 -0.92305819 -1.21251257 -0.63360381
0.4498875 0.4002035 0.4995716 b2 4.44772263 3.63907005 5.25637520
0.4498875 0.4002035 0.4995716 b3 -7.94643800 -9.00376537 -6.88911062
A14
0.5058676 0.4592957 0.5524394 b1 1.59690368 1.20277845 1.99102890
0.5058676 0.4592957 0.5524394 b2 -3.78725933 -4.21495258 -3.35956608
0.5058676 0.4592957 0.5524394 b3 8.73895686 7.38014808 10.09776565
0.5058676 0.4592957 0.5524394 b4 -3.77332206 -4.98361643 -2.56302770
0.5058676 0.4592957 0.5524394 b5 -4.33839831 -4.96075350 -3.71604312
A15
0.2315387 0.2010493 0.2620280 b1 8.34058532 6.77169763 9.90947301
0.2315387 0.2010493 0.2620280 b2 -6.58343508 -7.89985746 -5.26701269
0.2315387 0.2010493 0.2620280 b3 3.11283156 2.20383750 4.02182563
0.2315387 0.2010493 0.2620280 b4 -10.63256038 -12.21589986 -9.04922090
A16
1.7636035 1.5204396 2.0067673 b1 -1.87625943 -2.03473886 -1.71778000
A17
1.6549288 1.3943388 1.9155188 b1 -2.22363608 -2.44350649 -2.00376566
A18
0.9999717 0.8822981 1.1176452 b1 -1.12920867 -1.25973364 -0.99868370
A19
1.3764839 1.2429462 1.5100216 b1 -0.02088671 -0.08918441 0.04741098
A20
1.8142982 1.6389921 1.9896043 b1 0.33216145 0.26971198 0.39461092
tidy_gpcm %>% 
  write_csv("output/uoe_pre_gpcm-results.csv")
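As a sanity check on these parameters, the GPCM category probabilities can be computed directly from \(a\) and the step difficulties \(b_j\): \(P(X = k \mid \theta) \propto \exp\big(\sum_{j \le k} a(\theta - b_j)\big)\), with the empty sum for \(k = 0\) taken as 0. A minimal base-R sketch (the parameter values here are hypothetical, not taken from the table above):

```r
# GPCM category probabilities for a single item:
# P(X = k | theta) is proportional to exp(sum_{j <= k} a * (theta - b_j)),
# where b is the vector of step difficulties b_1, ..., b_K
gpcm_probs <- function(theta, a, b) {
  z <- cumsum(c(0, a * (theta - b)))  # cumulative logits, starting at 0 for k = 0
  exp(z) / sum(exp(z))                # normalise over the K + 1 categories
}

# Hypothetical 3-category item (2 steps): probabilities at theta = 0
gpcm_probs(theta = 0, a = 1.1, b = c(-0.8, 0.5))
```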

3.3 Information curves

theta <- seq(-6, 6, by=0.05)

info_matrix <- testinfo(fit_gpcm, theta, individual = TRUE)
colnames(info_matrix) <- item_info %>% pull(question)
item_info_data <- info_matrix %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "item", values_to = "info_y") %>% 
  left_join(item_info %>% select(item = question, MATH_group), by = "item") %>% 
  mutate(item = fct_reorder(item, parse_number(item)))

3.3.1 Test information curve

item_info_data %>% 
  group_by(theta) %>% 
  summarise(info_y = sum(info_y)) %>% 
  ggplot(aes(x = theta, y = info_y)) +
  geom_line() +
  labs(x = "Ability", y = "Information", title = "Edinburgh MDT") +
  theme_minimal()

ggsave("output/uoe_pre_info.pdf", width = 10, height = 6, units = "cm")

This shows that the information provided by the test is skewed toward the lower end of the ability scale, i.e. the test gives its most accurate ability estimates for students slightly below the mean.

3.3.2 Item information curves

Breaking this down by question helps to highlight those questions that are most/least informative:

item_info_data %>% 
  ggplot(aes(x = theta, y = info_y, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Information") +
  theme_minimal()

We can also compute the sums of different subsets of the information curves – here, we will look at the questions based on their MATH group:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Information") +
  theme_minimal()

ggsave("output/uoe_pre_info-curves_A-vs-B.pdf", units = "cm", width = 14, height = 6)

This shows that the information provided by the MATH Group B questions peaks at a higher point on the ability scale than that of the MATH Group A questions.

Since the number of items in each case is different, we consider instead the mean information per item:

item_info_data %>% 
  group_by(theta) %>% 
  summarise(
    items_all = sum(info_y) / n(),
    items_A = sum(ifelse(MATH_group == "A", info_y, 0)) / sum(ifelse(MATH_group == "A", 1, 0)),
    items_B = sum(ifelse(MATH_group == "B", info_y, 0)) / sum(ifelse(MATH_group == "B", 1, 0))
  ) %>% 
  pivot_longer(cols = starts_with("items_"), names_to = "items", names_prefix = "items_", values_to = "info_y") %>% 
  mutate(items = fct_relevel(items, "all", "A", "B")) %>% 
  ggplot(aes(x = theta, y = info_y, colour = items)) +
  geom_line() +
  scale_colour_manual(values = c("all" = "#000000", MATH_colours)) +
  labs(x = "Ability", y = "Mean information per item") +
  theme_minimal()

ggsave("output/uoe_pre_info-curves_A-vs-B-avg.pdf", units = "cm", width = 10, height = 6)

This shows that items of each MATH group are giving broadly similar levels of information on average, but at different points on the ability scale.

3.4 Total information

Using mirt’s areainfo function, we can find the total area under the information curves.

info_gpcm <- areainfo(fit_gpcm, c(-4,4))
info_gpcm %>% gt()
LowerBound UpperBound Info TotalInfo Proportion nitems
-4 4 24.91412 27.42629 0.9084027 20

This shows that the total information across all 20 items is 27.43, of which about 91% (24.91) lies in the range \(-4\leq\theta\leq4\).

3.4.1 Information by item

tidy_info <- item_info %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_gpcm,
               c(-4, 4),
               which.items = .x) %>% pull(TotalInfo) # TotalInfo = area over the whole ability range
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
question description MATH_group Information
A4 completing the square A 2.7267354
A14 find minimum gradient of cubic B 2.5274266
A20 product rule with given values B 1.8142981
A16 trig chain rule A 1.7636024
A17 chain rule A 1.6549245
A19 area between curve and x-axis (in 2 parts) B 1.3764810
A1 properties of fractions A 1.3437311
A13 equation of tangent A 1.3420102
A12 find point with given slope A 1.3372831
A7 graphical vector sum B 1.3232681
A6 trig wave function A 1.2742625
A9 simplify logs A 1.2731285
A10 identify graph of rational functions B 1.2726651
A11 summing arithmetic progression A 1.1089275
A3 composition of functions B 1.0869852
A5 trig double angle formula A 1.0449474
A18 definite integral A 0.9998165
A15 find and classify stationary points of cubic A 0.9166682
A2 find intersection of lines A 0.7226252
A8 compute angle between 3d vectors A 0.5165082

Restricting instead to the range \(-2\leq\theta\leq2\):

tidy_info <- item_info %>%
  mutate(item_num = row_number()) %>% 
  mutate(TotalInfo = purrr::map_dbl(
    item_num,
    ~ areainfo(fit_gpcm,
               c(-2, 2),
               which.items = .x) %>% pull(Info) # Info = area within the given bounds only
  ))

tidy_info %>%
  select(-item_num) %>% 
  arrange(-TotalInfo) %>% 
  #group_by(outcome) %>% 
  gt() %>% 
  fmt_number(columns = contains("a_"), decimals = 2) %>%
  fmt_number(columns = contains("b_"), decimals = 2) %>%
  data_color(
    columns = contains("info"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("outcome"),
    colors = scales::col_factor(palette = c("viridis"), domain = NULL)
  ) %>%
  cols_label(
    TotalInfo = "Information"
  )
question description MATH_group Information
A14 find minimum gradient of cubic B 2.0688682
A20 product rule with given values B 1.7043642
A4 completing the square A 1.5795642
A19 area between curve and x-axis (in 2 parts) B 1.2114717
A16 trig chain rule A 0.9757468
A7 graphical vector sum B 0.9572318
A13 equation of tangent A 0.8503176
A10 identify graph of rational functions B 0.8046603
A3 composition of functions B 0.7987483
A12 find point with given slope A 0.7787257
A6 trig wave function A 0.7728878
A17 chain rule A 0.6745425
A18 definite integral A 0.6629645
A9 simplify logs A 0.5935119
A5 trig double angle formula A 0.5302626
A15 find and classify stationary points of cubic A 0.4802470
A1 properties of fractions A 0.4683832
A11 summing arithmetic progression A 0.3844167
A8 compute angle between 3d vectors A 0.2316390
A2 find intersection of lines A 0.1933876

3.5 Response curves

Since the GPCM is a polytomous model, there is a characteristic curve for each possible score on each question:

trace_data <- probtrace(fit_gpcm, theta) %>% 
  as_tibble() %>% 
  bind_cols(theta = theta) %>% 
  pivot_longer(cols = -theta, names_to = "level", values_to = "y") %>% 
  left_join(mark_levels %>% select(item, level = levelname, score), by = "level") %>% 
  mutate(score = as.factor(score))

trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  ggplot(aes(x = theta, y = y, colour = score)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Probability of response") +
  theme_minimal()

To get a simplified picture for each question, we compute the expected score at each ability level:
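
Concretely, for item \(i\) with score categories \(s_{i0}, s_{i1}, \dots\) (here marks between 0 and 5, as given by mark_levels) and category response curves \(P_{ik}(\theta)\), the expected score at ability \(\theta\) is the probability-weighted sum

\[ E[X_i \mid \theta] = \sum_{k} s_{ik}\, P_{ik}(\theta). \]

The summarise() step below computes exactly this sum for each item.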

expected_scores <- trace_data %>% 
  mutate(item = fct_reorder(item, parse_number(item))) %>% 
  group_by(item, theta) %>% 
  summarise(expected_score = sum(as.double(as.character(score)) * y), .groups = "drop")

expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  facet_wrap(vars(item)) +
  labs(y = "Expected score") +
  theme_minimal()

The resulting curves look quite similar to those from the 2PL model, allowing a similar interpretation. For instance, superimposing all the curves shows that there is a spread of difficulties (i.e. the values of theta where the expected score is 2.5 out of 5) and that some questions are more discriminating than others (i.e. have steeper slopes):

plt <- expected_scores %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item, text = item)) +
  geom_line() +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

ggplotly(plt, tooltip = "text")
ggsave(plot = plt, file = "output/uoe_pre_iccs-superimposed.pdf", width = 20, height = 14, units = "cm")

highlight_removed_items <- expected_scores %>% 
  mutate(highlight_item = item %in% c("A2", "A8", "A11"))

highlight_removed_items %>% 
  ggplot(aes(x = theta, y = expected_score, colour = item, text = item, alpha = highlight_item)) +
  geom_line(aes(size = highlight_item)) +
  #geom_point(data = highlight_removed_items %>% filter(highlight_item == TRUE, theta == 0)) +
  ggrepel::geom_label_repel(
    data = highlight_removed_items %>% filter(highlight_item == TRUE, theta == 0),
    aes(label = item),
    box.padding = 0,
    show.legend = FALSE
  ) +
  scale_colour_viridis_d("Question", option = "plasma", end = 0.8, direction = -1) +
  scale_size_manual(values = c("FALSE" = 0.6, "TRUE" = 0.9), guide = "none") +
  scale_alpha_manual(values = c("FALSE" = 0.2, "TRUE" = 1), guide = "none") +
  labs(x = "Ability", y = "Expected score") +
  theme_minimal() +
  theme(legend.position = "bottom",
      legend.margin = margin(0, 0, 0, 0),
      legend.spacing.x = unit(1, "mm"),
      legend.spacing.y = unit(0, "mm")) +
  guides(colour = guide_legend(nrow = 2))

ggsave(file = "output/uoe_pre_iccs-highlight.pdf", width = 16, height = 10, units = "cm")

3.5.1 Test response curve

total_expected_score <- expected_scores %>% 
  group_by(theta) %>% 
  summarise(expected_score = sum(expected_score))

total_expected_score %>% 
  ggplot(aes(x = theta, y = expected_score)) +
  geom_line() +
  # geom_point(data = total_expected_score %>% filter(theta == 0)) +
  # ggrepel::geom_label_repel(data = total_expected_score %>% filter(theta == 0), aes(label = round(expected_score, 1)), box.padding = 0.5) +
  scale_colour_viridis_d(option = "plasma", end = 0.8, direction = -1) +
  labs(y = "Expected score") +
  theme_minimal()

4 Predictive validity

course_results <- read_csv("data-uoe/ANON_2013-2017_course-results.csv", col_types = "ccddddddddd")

course_results_long <- course_results %>% 
  pivot_longer(cols = !c(AnonID, year), names_to = "course", values_to = "mark") %>% 
  filter(!is.na(mark)) %>% 
  separate(course, into = c("course_type", "course"), sep = "_") %>% 
  mutate(year = str_replace(year, "/", "-1"))

course_results_vs_diagtest <- course_results_long %>% 
  left_join(test_scores_with_ability %>% select(AnonID, year, diagtest_score = Total, F1), by = c("AnonID", "year")) %>% 
  filter(!is.na(diagtest_score))

We have both course results and diagnostic test scores for the following number of students:

course_results_vs_diagtest %>% 
  select(AnonID, year) %>% 
  distinct() %>% 
  tally()
## # A tibble: 1 × 1
##       n
##   <int>
## 1  2509

4.1 Specialist courses

Mathematics students take linear algebra (ILA) in semester 1, then calculus (CAP) and a proofs course (PPS) in semester 2.

course_results_vs_diagtest %>% 
  filter(course_type == "spec") %>% 
  janitor::tabyl(year, course) %>% 
  gt()
year CAP ILA PPS
2013-14 286 302 170
2014-15 313 347 177
2015-16 236 257 168
2016-17 280 300 179
course_results_vs_diagtest %>% 
  filter(course_type == "spec") %>% 
  mutate(course = fct_relevel(course, "ILA", "CAP", "PPS")) %>% 
  ggplot(aes(x = diagtest_score, y = mark)) +
  geom_point(size = 0.8, stroke = 0, alpha = 0.5) +
  geom_smooth(method = lm, formula = "y ~ x") +
  ggpubr::stat_cor(label.y = 105, p.accuracy = 0.001) +
  facet_grid(cols = vars(course)) +
  theme_minimal() +
  theme(strip.text.x = element_text(size = 12)) +
  labs(x = "Edinburgh MDT score", y = "Course result")

ggsave("output/uoe_pre_regression-spec.pdf", units = "cm", width = 16, height = 8)

This shows that the diagnostic test is moderately predictive of success in year 1.

  • Note that the correlations are lower in the semester 2 courses; this is to be expected, since the diagnostic test becomes less relevant as students receive more teaching.
  • The correlation in PPS is lower than in CAP. Again this is as we would expect, since much of PPS is concerned with writing proofs and this is not something addressed by the diagnostic test.
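
The per-course correlations annotated on the plots by stat_cor() can also be computed directly; a minimal sketch using the columns defined above:

```r
library(dplyr)

# Pearson correlation between diagnostic test score and course mark,
# per specialist course (the same quantities stat_cor() annotates)
course_results_vs_diagtest %>% 
  filter(course_type == "spec") %>% 
  group_by(course) %>% 
  summarise(
    n = n(),
    r = cor(diagtest_score, mark, use = "complete.obs"),
    .groups = "drop"
  )
```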

4.2 Non-specialist courses

On the non-specialist side, students take the following courses:

  • MSE - for engineering and chemistry students (discontinued after 2014-15)
  • EM - for engineering students
  • MNS - for chemistry students

The courses come in two parts: 1a in semester 1, and 1b in semester 2.

course_results_vs_diagtest %>% 
  filter(course_type == "nonspec") %>% 
  janitor::tabyl(year, course) %>% 
  gt()
year EM1a EM1b MNS1a MNS1b MSE1a MSE1b
2013-14 0 0 0 0 316 310
2014-15 0 0 0 0 364 351
2015-16 231 215 61 57 0 0
2016-17 222 223 63 59 0 0
course_results_vs_diagtest %>% 
  filter(course_type == "nonspec") %>% 
  separate(course, into = c("course", "semester"), sep = "\\d") %>% # split e.g. "EM1a" into "EM" and "a"
  mutate(semester = ifelse(semester == "a", "Semester 1", "Semester 2")) %>% 
  mutate(course = fct_relevel(course, "MSE", "EM", "MNS")) %>% 
  ggplot(aes(x = diagtest_score, y = mark)) +
  geom_point(size = 0.8, stroke = 0, alpha = 0.5) +
  geom_smooth(method = lm, formula = "y ~ x") +
  ggpubr::stat_cor(label.y = 105, p.accuracy = 0.001) +
  facet_grid(rows = vars(semester), cols = vars(course)) +
  theme_minimal() +
  theme(strip.text.x = element_text(size = 12)) +
  labs(x = "Edinburgh MDT score", y = "Course result")

ggsave("output/uoe_pre_regression-nonspec.pdf", units = "cm", width = 16, height = 16)

We see a similar pattern of the diagnostic test being moderately predictive of Semester 1 results, and still somewhat predictive in Semester 2.

About this report

This report supports the analysis in the following paper:

[citation needed]

Packages

In this analysis we used the following packages:

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots